Abstract

Consider the following file caching problem: in response to a sequence 
of requests for files, where each file has a specified size and retrieval cost, 
maintain a cache of files of total size at most some specified k so as to 
minimize the total retrieval cost. Specifically, when a requested file is not 
in the cache, bring it into the cache and pay the retrieval cost, and remove 
other files from the cache so that the total size of files remaining in the 
cache is at most k. This problem generalizes previous paging and caching 
problems by allowing objects of arbitrary size and cost, both important 
attributes when caching files for world-wide-web browsers, servers, and 
proxies. 

We give a simple deterministic on-line algorithm that generalizes many
well-known paging and weighted-caching strategies, including
least-recently-used, first-in-first-out, flush-when-full, and the balance
algorithm. On any request sequence, the total cost incurred by the algorithm
is at most k/(k - h + 1) times the minimum possible using a cache of size
h ≤ k.

For any algorithm satisfying the latter bound, we show it is also the 
case that for most choices of k, the retrieval cost is either insignificant 
or at most a constant (independent of k) times the optimum. This helps 
explain why competitive ratios of many on-line paging algorithms have 
been typically observed to be constant in practice. 
1 Background and Statement of Results

The file caching problem is as follows. Given a cache with a specified size k 
(a positive integer) and a sequence of requests to files, where each file has a 
specified size (a positive integer) and a specified retrieval cost (a non-negative 
number), maintain files in the cache to satisfy the requests while minimizing 
the total retrieval cost. Specifically, when a requested file is not in the cache, 
bring it into the cache, paying the retrieval cost of the file, and remove other 
files from the cache so that the total size of files remaining in the cache is at 
most k. 
Following Sleator and Tarjan [17], we say a file caching algorithm is
$c(h, k)$-competitive if on any sequence the total retrieval cost incurred by
the algorithm using a cache of size k is at most $c(h, k)$ times the minimum
possible cost using a cache of size h. An algorithm is on-line if its response
to a request does not depend on later requests in the sequence.

Uniform sizes, uniform costs. With the restriction that all file sizes and
costs are the same, the problem is called paging. Paging has been extensively
studied. In a seminal paper, Sleator and Tarjan [17] showed that
least-recently-used and a number of other deterministic on-line paging
strategies are $k/(k-h+1)$-competitive. Sleator and Tarjan also showed that
this performance guarantee is the best possible for any deterministic on-line
algorithm.

A simple randomized paging algorithm called the marking algorithm was
shown to be $2 \ln k$-competitive by Fiat et al. An optimal $\ln k$-competitive
randomized paging algorithm was given by McGeoch and Sleator [16]. In [20],
deterministic paging strategies were shown to be loosely $O(\ln k)$-competitive.
This means roughly that for any sequence, for most values of k, the fault rate
of the algorithm using a cache of size k is either insignificant or the
algorithm is $O(\ln k)$-competitive versus the optimum algorithm using a cache
of size k. Similarly, the marking algorithm was shown to be loosely
$(2 \ln \ln k + O(1))$-competitive.

Uniform sizes, arbitrary costs. The special case of file caching when
all file sizes are the same is called weighted caching. For weighted caching,
Chrobak, Karloff, Payne and Vishwanathan showed that an algorithm called
the "balance" algorithm is $k$-competitive. Subsequently, in [20], a
generalization of that algorithm called the "greedy-dual" algorithm was shown
to be $k/(k-h+1)$-competitive. The greedy-dual algorithm generalizes many
well-known paging and weighted-caching strategies, including
least-recently-used, first-in-first-out, flush-when-full, and the balance
algorithm.

Arbitrary sizes, cost = 1 or cost = size. Motivated by the importance of
file size in caching for world-wide-web applications (see comment below), Irani
considered two special cases of file caching: when the costs are either all equal
(the goal is to minimize the number of retrievals), and when each cost equals the
file size (the goal is to minimize the total number of bytes retrieved). For these
two cases, Irani gave $O(\log^2 k)$-competitive randomized on-line algorithms.

Comment: the importance of sizes and costs. File caching is important 
for world-wide-web applications. For instance, in browsers and proxy servers 
remote files are cached locally to avoid remote retrieval. In web servers, disk 
files are cached in fast memory to speed response time. As Irani points out
(see Irani's paper and references therein), file size is an important
consideration; caching
policies adapted from memory management applications that don't take size 
into account do not work well in practice. 
Algorithm Landlord

Maintain a real value credit[f] with each file f in the cache.
When a file g is requested:

1. if g is not in the cache then
2.     until there is room for g in the cache:
3.         For each file f in the cache, decrease credit[f] by Δ · size[f],
4.         where Δ = min over f in the cache of credit[f]/size[f].
5.         Evict from the cache any subset of the files f such that credit[f] = 0.
6.     Bring g into the cache and set credit[g] ← cost(g).
7. else Reset credit[g] to any value between its current value and cost(g).

Figure 1: The on-line file caching algorithm Landlord. Credit is given to each
file when it is requested. "Rent" is charged to each file in the cache in proportion
to its size. Files are evicted as they run out of credit. Step 7 is not necessary for
the worst-case analysis, but it is likely to be important in practice: raising the
credit as much as possible in step 7 generalizes the least-recently-used paging
strategy; not raising at all generalizes the first-in-first-out paging strategy.
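
The figure can be read as the following minimal Python sketch. It is an illustration under stated assumptions, not the paper's own implementation: the function name `landlord`, the dictionary-based cache, the `refresh` flag (which picks between the two extremes allowed in step 7), and the rule of evicting zero-credit files in insertion order only until the request fits are all choices made here. Exact rational arithmetic is used so that the test "credit = 0" in step 5 is reliable.

```python
from fractions import Fraction


def landlord(requests, size, cost, k, refresh=True):
    """Simulate Landlord with cache size k; return the total retrieval cost.

    size and cost map each file to its integer size and non-negative cost.
    refresh=True raises credit to cost(g) on a hit (LRU-like, step 7);
    refresh=False leaves it unchanged (FIFO-like).
    """
    credit = {}          # credit[f] for every file f currently in the cache
    used = 0             # total size of the cached files
    total = Fraction(0)  # total retrieval cost paid so far
    for g in requests:
        if g in credit:                      # step 7: cache hit
            if refresh:
                credit[g] = Fraction(cost[g])
            continue
        if size[g] > k:
            raise ValueError("file larger than the cache")
        while used + size[g] > k:            # step 2: until there is room
            # steps 3-4: charge rent in proportion to size
            delta = min(credit[f] / size[f] for f in credit)
            for f in credit:
                credit[f] -= delta * size[f]
            # step 5: evict zero-credit files (assumption: only as needed)
            for f in [f for f in credit if credit[f] == 0]:
                if used + size[g] <= k:
                    break
                used -= size[f]
                del credit[f]
        credit[g] = Fraction(cost[g])        # step 6: retrieve g
        used += size[g]
        total += cost[g]
    return total
```

For example, with hypothetical files `a` (size 2, cost 4), `b` (size 1, cost 1) and `c` (size 2, cost 2) and a cache of size 3, the request sequence a, b, c, a misses on every request, since each retrieval of `a` or `c` forces out everything else.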

Allowing arbitrary costs is likely to be important as well. In many cases, 
the cost (e.g., latency, total transmission time, or network resources used) will 
neither be uniform across files nor proportional solely to the size. For instance, 
the cost to retrieve a remote file can depend on the distance the file must travel in 
the network. Even accounting for distance, the cost need not be proportional to 
the size, e.g., because of economies of scale in routing files through the network. 
Further, in some applications it makes sense to assign different kinds of costs 
to different kinds of files. For instance, some kinds of documents are displayed 
by web browsers as they are received, so that the effective delay for the user 
is determined more by the latency than the total transmission time. Other 
documents must be fully transmitted before becoming useful. Both kinds of 
files can be present in a cache. In all these cases, assigning uniform costs or 
assigning every file's cost to be its size is not ideal.^1

This paper: arbitrary sizes, arbitrary costs. This paper presents a
simple deterministic on-line algorithm called Landlord (shown in Figure 1).
Landlord handles the problem of file caching with arbitrary costs and integer
sizes. The first result is:

1 In many applications the actual cost to access a file may vary with time; that issue is not 
considered here, nor is the issue of cache consistency (i.e., if the remote file changes at the 
source, how does the local cache get updated? The simplest adaptation of the model here 
would be to assume that a changed file is treated as a new file; this would require that the 
local cache strategy learn about the change in some way). Finally, the focus here is on simple 
local caching strategies, rather than distributed strategies in which servers cooperate to cache 
pages across a network (see e.g. uW)- 
Theorem 1 Landlord is $k/(k-h+1)$-competitive for file caching.

This performance guarantee is the best possible for any deterministic on-line
algorithm.^2 File caching is not a special case of the $k$-server problem,
although weighted caching is a special case of both file caching and the
$k$-server problem.

Landlord is a generalization of the greedy-dual algorithm [20] for weighted
caching, which in turn generalizes least-recently-used and first-in-first-out
(paging strategies), as well as the balance algorithm for weighted caching. The
analysis uses the potential function
$\Phi = (h-1) \sum_{f \in \mathrm{LL}} \mathrm{credit}[f] + k \sum_{f \in \mathrm{OPT}} (\mathrm{cost}(f) - \mathrm{credit}[f])$.
The analysis is simpler than that of [20] for the special case of weighted
caching.

In an independent work, Cao and Irani showed that Landlord (with
step 7 raising credit[g] as much as possible) is $k$-competitive. They also gave
empirical evidence that the algorithm performs well in practice.

This paper: $(\varepsilon, \delta)$-loose $c$-competitiveness. In practice it
has been observed that on "typical" request sequences, paging algorithms such
as least-recently-used, using a cache of size k, incur a cost within a small
constant factor (independent of k) times the minimum possible using a cache of
size k. This is in contrast to the theoretically optimal competitive ratio of
k. A number of refinements of competitive analysis have been proposed to try to
understand the relevant factors. Borodin, Irani, Raghavan, and Schieber, in
order to model locality of reference, proposed the access-graph model, which
restricts the request sequences to paths in a given graph. Karlin, Phillips,
and Raghavan proposed a variant in which the graph is a Markov chain (i.e. the
edges of the graph are assigned probabilities, and the request sequence
corresponds to a random walk). Koutsoupias and Papadimitriou [13] proposed the
comparative ratio (for comparing classes of on-line algorithms) and the diffuse
adversary model (in which the adversary chooses a probability distribution,
rather than a sequence, from some restricted class of distributions).

In this paper we introduce a refinement of the aforementioned loosely
competitive ratio [20] (another previously proposed alternative model). The
model is motivated by two observations. First, in practice, if the retrieval
cost is low enough in an absolute sense, the competitive ratio is of no
concern. For instance, in paging, if the fault rate drops much below

    (time to execute a machine instruction) / (time to retrieve a page from disk)

then the total time to handle page faults is less than the time to execute
instructions, so that page faults cease to be the limiting factor in the
execution time. Similar considerations hold in other settings such as file
caching. To formalize

2 Manasse, McGeoch, and Sleator [15] show that no deterministic on-line algorithm for the
well-known $k$-server problem on any metric space of more than $k$ points is better than
$k/(k-h+1)$-competitive. This implies that, at least for any special case when all sizes are 1
(i.e. weighted caching), no deterministic on-line algorithm for file caching is better than
$k/(k-h+1)$-competitive.
this, we introduce a parameter $\varepsilon > 0$, and say that "low enough" for
a request sequence $r$ means "no more than $\varepsilon$ times the sum of the
retrieval costs" (the sum being taken over all requests). This is tantamount to
assuming that handling a file of cost $\mathrm{cost}(f)$ requires overhead of
$\varepsilon \cdot \mathrm{cost}(f)$ whether it is retrieved or not.

Second, in many circumstances, we do not expect the input sequences to be
adversarially tailored for our particular cache size k. To model this, rather than
somehow restricting the input sequences, we allow all input sequences but for
each, we consider what happens at a typical cache size k. Formally, for each
sequence, we consider all the values of k in any range $\{1, 2, \ldots, n\}$,
and we ask that the competitive ratio be at most some constant $c$ for at least
$(1 - \delta)n$ of these values, where $\delta$ is a parameter to the model.

Our model, which we dub "loose competitiveness" , combines both these 
ideas: 

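Definition 1 A file caching algorithm $A$ is $(\varepsilon, \delta, n)$-loosely
$c$-competitive if, for any request sequence $r$, at least $(1 - \delta)n$ of
the values $k \in \{1, 2, \ldots, n\}$ satisfy

$$\mathrm{cost}(A, k, r) \le \max\Bigl\{ c \cdot \mathrm{cost}(\mathrm{OPT}, k, r),\ \varepsilon \cdot \sum_{f \in r} \mathrm{cost}(f) \Bigr\}. \tag{1}$$

$A$ is $(\varepsilon, \delta)$-loosely $c$-competitive if $A$ is
$(\varepsilon, \delta, n)$-loosely $c$-competitive for all positive integers $n$.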

Here cost(A, k, r) denotes the cost incurred by algorithm A using a cache of size 
k on sequence r. Opt denotes the optimal algorithm, so that cost(OPT, k, r) 
is the minimum possible cost to handle the sequence r using a cache of size k. 
The sum on the right ranges over all requests in r, so that if a file is requested 
more than once, its cost is counted for each request. 
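
To make the definition concrete, here is a small Python sketch that checks Condition (1) for every cache size on one fixed request sequence. It assumes the costs have already been measured: `alg_cost[k-1]` and `opt_cost[k-1]` are hypothetical inputs holding cost(A, k, r) and cost(OPT, k, r) for k = 1, ..., n, and `total_request_cost` is the sum of cost(f) over all requests. The function names are illustrative only.

```python
def bad_cache_sizes(alg_cost, opt_cost, total_request_cost, c, eps):
    """Return the cache sizes k in {1..n} that violate Condition (1)."""
    bad = []
    for k in range(1, len(alg_cost) + 1):
        # Right-hand side of (1): the larger of c*OPT and the eps-threshold.
        threshold = max(c * opt_cost[k - 1], eps * total_request_cost)
        if alg_cost[k - 1] > threshold:
            bad.append(k)
    return bad


def loosely_competitive_on(alg_cost, opt_cost, total_request_cost,
                           c, eps, delta):
    """True if at most delta*n cache sizes are bad on this one sequence."""
    n = len(alg_cost)
    bad = bad_cache_sizes(alg_cost, opt_cost, total_request_cost, c, eps)
    return len(bad) <= delta * n
```

Note that the definition quantifies over all request sequences, so a check like this can only refute loose competitiveness for given parameters, not establish it.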

Since the standard competitive ratio grows with k, it is not a priori clear that
any on-line algorithm can be $(\varepsilon, \delta)$-loosely $c$-competitive for
any $c$ that depends only on $\varepsilon$ and $\delta$. Our second result is
the following.

Theorem 2 Every $k/(k-h+1)$-competitive algorithm is
$(\varepsilon, \delta)$-loosely $c$-competitive for any
$0 < \varepsilon, \delta < 1$ and
$c = (e/\delta) \ln(e/\varepsilon) = O((1/\delta) \log(1/\varepsilon))$.

(Throughout the paper $e$ is the base of the natural logarithm.) The
interpretation is that for most choices of k, the retrieval cost is either
insignificant or the competitive ratio is constant.

This result supports the intuition that it is meaningful to compare an
algorithm against a "handicapped" optimal algorithm (most competitive analyses
consider the case h = k). A strong performance guarantee, even against a
handicapped optimal algorithm, may be as meaningful as (or more meaningful
than) a weak performance guarantee against a non-handicapped adversary.



Our proof is similar in spirit to the proof in [20] for the special case of paging,
but the proof here is simpler, more general, and gives a stronger result.
Of course the following corollary is immediate:

Corollary 1 Landlord is $(\varepsilon, \delta)$-loosely $c$-competitive for
$c = (e/\delta) \ln(e/\varepsilon) = O((1/\delta) \log(1/\varepsilon))$.
This helps explain why the competitive ratios of the many on-line algorithms
that Landlord generalizes are typically observed to be constant. 
For completeness, we also consider randomized algorithms: 

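Theorem 3 Let $0 < \varepsilon, \delta < 1$. Any
$(\alpha + \beta \ln \frac{k}{k-h+1})$-competitive algorithm is
$(\varepsilon, \delta)$-loosely $c$-competitive for
$c = e\alpha + e\beta \ln[(1/\delta) \ln(e/\varepsilon)] = O(\log[(1/\delta) \log(1/\varepsilon)])$.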

It is known that the marking algorithm (a randomized on-line algorithm) is
$(1 + 2 \ln \frac{k}{k-h})$-competitive for paging when h < k and
$(1 + 2 \ln k)$-competitive for h = k. It follows by algebra that the marking
algorithm is $(1 + 2 \ln 2 + 2 \ln \frac{k}{k-h+1})$-competitive. Although a
stronger result can probably be shown, this simple one and Theorem 3 imply the
following corollary:

Corollary 2 The marking algorithm is $(\varepsilon, \delta)$-loosely
$c$-competitive for paging for
$c = e + 2e \ln 2 + 2e \ln[(1/\delta) \ln(e/\varepsilon)] = O(\log[(1/\delta) \log(1/\varepsilon)])$.
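
The algebra behind the corollary can be spelled out as follows. This is one way to fill in the step, assuming the known bounds for the marking algorithm have the forms $1 + 2\ln\frac{k}{k-h}$ for $h < k$ and $1 + 2\ln k$ for $h = k$.

```latex
% Case h < k: then k - h >= 1, so k - h + 1 <= 2(k - h), hence
\[
  \frac{k}{k-h} \;\le\; \frac{2k}{k-h+1}
  \quad\Longrightarrow\quad
  1 + 2\ln\frac{k}{k-h} \;\le\; 1 + 2\ln 2 + 2\ln\frac{k}{k-h+1}.
\]
% Case h = k: then k/(k-h+1) = k, so
\[
  1 + 2\ln k \;\le\; 1 + 2\ln 2 + 2\ln\frac{k}{k-h+1}.
\]
% Both cases fit the form of Theorem 3 with
% alpha = 1 + 2 ln 2 and beta = 2, which gives
\[
  c \;=\; e\alpha + e\beta \ln\!\Bigl[\tfrac{1}{\delta}\ln\tfrac{e}{\varepsilon}\Bigr]
    \;=\; e + 2e\ln 2 + 2e \ln\!\Bigl[\tfrac{1}{\delta}\ln\tfrac{e}{\varepsilon}\Bigr].
\]
```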

Finally, we show Theorem 2 and Corollary 1 are tight up to a constant factor:

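Theorem 4 For any $\varepsilon$ and $\delta$ with $0 < \varepsilon < 1$ and
$0 < \delta < 1/2$, Landlord is not $(\varepsilon, \delta)$-loosely
$c$-competitive for
$c = \frac{1}{8\delta} \log_2 \frac{1}{2\varepsilon} = \Theta((1/\delta) \log(1/\varepsilon))$.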

2 Analysis of LANDLORD. 

Theorem 1 Landlord is $k/(k-h+1)$-competitive for file caching.

Proof: Define the potential function

$$\Phi = (h - 1) \sum_{f \in \mathrm{LL}} \mathrm{credit}[f] + k \sum_{f \in \mathrm{OPT}} \bigl(\mathrm{cost}(f) - \mathrm{credit}[f]\bigr).$$

Here LL denotes the cache of Landlord; OPT denotes the cache of Opt. For
$f \notin \mathrm{LL}$, by convention credit[f] = 0. Before the first request
of a sequence, when both caches are empty, $\Phi$ is zero. After all requests
have been processed (and in fact at all times), $\Phi \ge 0$. Below we show
that at each request:

• if Opt retrieves a file of cost $c$, $\Phi$ increases by at most $kc$;

• if Landlord retrieves a file of cost $c$, $\Phi$ decreases by at least $(k - h + 1)c$;

• at all other times $\Phi$ does not increase.

These facts imply that the cost incurred by Landlord is bounded by
$k/(k - h + 1)$ times the cost incurred by Opt.
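
The implication is a standard telescoping argument. Spelled out, with $\Phi_0$ and $\Phi_{\mathrm{end}}$ denoting the potential before the first request and after the last (notation introduced here only for this step):

```latex
% Sum the three bounds over all steps of the whole request sequence:
\[
  \Phi_{\mathrm{end}} - \Phi_0
  \;\le\; k \cdot \mathrm{cost}(\mathrm{OPT})
          \;-\; (k - h + 1) \cdot \mathrm{cost}(\mathrm{Landlord}).
\]
% Since \Phi_0 = 0 and \Phi_{\mathrm{end}} \ge 0, the left side is non-negative, so
\[
  (k - h + 1) \cdot \mathrm{cost}(\mathrm{Landlord})
  \;\le\; k \cdot \mathrm{cost}(\mathrm{OPT})
  \quad\Longrightarrow\quad
  \mathrm{cost}(\mathrm{Landlord})
  \;\le\; \frac{k}{k - h + 1}\, \mathrm{cost}(\mathrm{OPT}).
\]
```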

The actions affecting $\Phi$ following each request can be broken down into a
sequence of steps, with each step being one of the following. We analyze the
effect of each step on $\Phi$.

• Opt evicts a file f.

Since credit[f] ≤ cost(f), $\Phi$ cannot increase.

• Opt retrieves a file g.

In this step Opt pays the retrieval cost cost(g).

Since credit[g] ≥ 0, $\Phi$ can increase by at most $k \cdot \mathrm{cost}(g)$.

• Landlord decreases credit[f] for all $f \in \mathrm{LL}$.

Since the decrease of a given credit[f] is $\Delta \cdot \mathrm{size}(f)$, the
net decrease in $\Phi$ is $\Delta$ times

$$(h - 1)\,\mathrm{size}(\mathrm{LL}) - k\,\mathrm{size}(\mathrm{OPT} \cap \mathrm{LL}),$$

where $\mathrm{size}(X)$ denotes $\sum_{f \in X} \mathrm{size}(f)$.

When this step occurs, we can assume that the requested file g has already
been retrieved by Opt but is not in LL. Thus,
$\mathrm{size}(\mathrm{OPT} \cap \mathrm{LL}) \le h - \mathrm{size}(g)$.

Further, there is not room for g in LL, so that
$\mathrm{size}(\mathrm{LL}) \ge k - \mathrm{size}(g) + 1$
(recall that sizes are assumed to be integers). Thus the decrease in the
potential function is at least $\Delta$ times

$$(h - 1)(k - \mathrm{size}(g) + 1) - k(h - \mathrm{size}(g)).$$

Since $\mathrm{size}(g) \ge 1$ and $k \ge h$, this is at least
$(h - 1)(k - 1 + 1) - k(h - 1) = 0$.

• Landlord evicts a file f.

Landlord only evicts f when credit[f] = 0. Thus, $\Phi$ is unchanged.

• Landlord retrieves the requested file g and sets credit[g] to cost(g).

In this step Landlord pays the retrieval cost cost(g).

Since g was not previously in the cache (and credit[g] was zero), and
because we can assume that $g \in \mathrm{OPT}$, $\Phi$ decreases by
$-(h - 1)\,\mathrm{cost}(g) + k\,\mathrm{cost}(g) = (k - h + 1)\,\mathrm{cost}(g)$.

• Landlord resets credit[g] between its current value and cost(g).

Again, we can assume $g \in \mathrm{OPT}$. If credit[g] changes, it can only
increase. In this case, since $(h - 1) < k$, $\Phi$ decreases. □
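
The only step above that needs real algebra is the rent-charging step. As a sanity check, the following Python sketch brute-forces the lower bound $(h-1)(k - \mathrm{size}(g) + 1) - k(h - \mathrm{size}(g)) \ge 0$ over small integer parameters. The function names are illustrative, and the restriction $\mathrm{size}(g) \le h$ is an assumption made here (Opt must be able to hold g). The expression expands to $(\mathrm{size}(g) - 1)(k - h + 1)$, which the check also confirms.

```python
def rent_step_lower_bound(h, k, s):
    """Lower bound (per unit of Delta) on the drop in the potential."""
    return (h - 1) * (k - s + 1) - k * (h - s)


def bound_is_nonnegative(max_k):
    """Check the bound for all 1 <= s <= h <= k <= max_k."""
    for k in range(1, max_k + 1):
        for h in range(1, k + 1):
            for s in range(1, h + 1):
                value = rent_step_lower_bound(h, k, s)
                # Closed form obtained by expanding the product.
                if value != (s - 1) * (k - h + 1) or value < 0:
                    return False
    return True
```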



3 Upper Bounds on Loose Competitiveness. 

The following technical lemma is at the core of Theorems 2 and 3.

Lemma 1 Let $A$ be any $\tau(k, k - h)$-competitive algorithm for some function
$\tau$ that is increasing with respect to $k$ and decreasing with respect to
$k - h$.

For any $b, \varepsilon, \delta, n > 0$ ($n$ an integer, $b < \delta n$), $A$
is $(\varepsilon, \delta, n)$-loosely $c$-competitive for

$$c = \tau(n, b)\, \varepsilon^{-(b+1)/(\delta n - b - 1)}.$$
Proof: Fix any request sequence $r$ and $b, \varepsilon, \delta, n > 0$. Define
$c$ as above. Say a value $k \in \{1, 2, \ldots, n\}$ is bad if

$$\mathrm{cost}(A, k, r) > \max\Bigl\{ c \cdot \mathrm{cost}(\mathrm{OPT}, k, r),\ \varepsilon \cdot \sum_{f \in r} \mathrm{cost}(f) \Bigr\}. \tag{2}$$

We will show that at most $\delta n$ values are bad.

Denote the bad values (in increasing order) $k_0, k_1, \ldots, k_B$. The form
of the argument is this: on the one hand, we show that
$\mathrm{cost}(A, k_i, r)$ decreases exponentially with $i$; on the other hand,
we know that (for each $i$) $\mathrm{cost}(A, k_i, r)$ is not too small
(specifically, it is no smaller than $\varepsilon$ times
$\mathrm{cost}(A, k_0, r)$); together, these will imply that $B$ cannot be too
large.

From the sequence of bad values, select the subsequence
$k_0, k_m, k_{2m}, \ldots$ (every $m$-th bad value, with the stride $m$ chosen
so that the two properties below hold) and denote it
$k'_0, k'_1, \ldots, k'_{B'}$. The properties of this sequence that we use are
$k'_i - k'_{i-1} \ge b$ for each $i$ and $B' \ge B/(b + 1)$.

Since $A$ is $\tau(k, k - h)$-competitive, choosing $k = k'_i$ and
$h = k'_{i-1}$ shows that

$$\mathrm{cost}(A, k'_i, r) \le \tau(k'_i, k'_i - k'_{i-1})\, \mathrm{cost}(\mathrm{OPT}, k'_{i-1}, r).$$

From the first term in the maximum in (2),
$\mathrm{cost}(A, k'_{i-1}, r) > c \cdot \mathrm{cost}(\mathrm{OPT}, k'_{i-1}, r)$.
The condition on $\tau$ implies $\tau(k'_i, k'_i - k'_{i-1}) \le \tau(n, b)$.
Thus,

$$\mathrm{cost}(A, k'_i, r) \le (\tau(n, b)/c)\, \mathrm{cost}(A, k'_{i-1}, r).$$

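Inductively,

$$\mathrm{cost}(A, k'_{B'}, r) \le (\tau(n, b)/c)^{B'}\, \mathrm{cost}(A, k'_0, r).$$

That is, for every $b$ bad values, $\mathrm{cost}(A, k_i, r)$ decreases by a
factor of $\tau(n, b)/c$. The rest is algebra. As noted before,
$\mathrm{cost}(A, k'_{B'}, r) \ge \varepsilon\, \mathrm{cost}(A, k'_0, r)$:
by the second term in the maximum in (2), the cost at a bad value exceeds
$\varepsilon \sum_{f \in r} \mathrm{cost}(f)$, and no algorithm pays more than
$\sum_{f \in r} \mathrm{cost}(f)$ in total. Combining with the above inequality
gives $(\tau(n, b)/c)^{B'} \ge \varepsilon$, which (by substituting for $c$ and
simplifying) gives

$$B' \le \delta n/(b + 1) - 1.$$

Combining this with $B' \ge B/(b + 1)$ gives $B + 1 < \delta n$. That is, there
are fewer than $\delta n$ bad values. □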

Theorem 2 Every $k/(k-h+1)$-competitive algorithm is
$(\varepsilon, \delta)$-loosely $c$-competitive for any
$0 < \varepsilon, \delta < 1$ and
$c = (e/\delta) \ln(e/\varepsilon) = O((1/\delta) \log(1/\varepsilon))$.

Proof: Fix any $\varepsilon, \delta, n > 0$ ($n$ integer). We need to show the
algorithm is $(\varepsilon, \delta, n)$-loosely $c$-competitive. Let
$\tau(k, k - h) = k/(k - h + 1)$ and $b = \delta n/\ln(e/\varepsilon) - 1$. If
$b \le 0$, then an easy calculation shows $c \ge n$, and since the algorithm is
$k$-competitive, the conclusion holds trivially.

Otherwise ($b > 0$), we apply the technical lemma. With this choice of $b$,
$\varepsilon^{-(b+1)/(\delta n - b - 1)} = e$, so $c = e\,\tau(n, b)$. For this
$\tau$ and $b$, $\tau(n, b)$ simplifies to $(1/\delta) \ln(e/\varepsilon)$. □
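
The simplification in this proof can be checked numerically. The sketch below assumes the constant in Lemma 1 has the form $c = \tau(n, b)\,\varepsilon^{-(b+1)/(\delta n - b - 1)}$ with $\tau(n, b) = n/(b + 1)$ and $b = \delta n/\ln(e/\varepsilon) - 1$, and compares it with the closed form $(e/\delta)\ln(e/\varepsilon)$ of Theorem 2. The function names are illustrative.

```python
import math


def lemma_constant(n, eps, delta):
    """c from the technical lemma for tau(k, k-h) = k/(k-h+1)."""
    b = delta * n / math.log(math.e / eps) - 1
    if b <= 0:
        raise ValueError("this branch of the proof needs b > 0")
    tau = n / (b + 1)                      # tau(n, b) with k = n, k - h = b
    exponent = -(b + 1) / (delta * n - b - 1)
    return tau * eps ** exponent


def theorem2_constant(eps, delta):
    """Closed form c = (e/delta) * ln(e/eps)."""
    return (math.e / delta) * math.log(math.e / eps)
```

For instance, `lemma_constant(1000, 0.01, 0.1)` and `theorem2_constant(0.01, 0.1)` agree up to floating-point error, and the value does not depend on n.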

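Theorem 3 Let $0 < \varepsilon, \delta < 1$. Any
$(\alpha + \beta \ln \frac{k}{k-h+1})$-competitive algorithm is
$(\varepsilon, \delta)$-loosely $c$-competitive for
$c = e\alpha + e\beta \ln[(1/\delta) \ln(e/\varepsilon)] = O(\log[(1/\delta) \log(1/\varepsilon)])$.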
Algorithm Landlord for the special case of paging

Maintain a value credit[f] ∈ [0, 1] with each item f in the cache.
When an item g is requested:

1. if g is not in the cache then
2.     if there are no 0-credit items in the cache,
3.         then decrease all credits by the minimum credit.
4.     Evict from the cache any subset of the items f such that credit[f] = 0.
5.     Bring g into the cache and set credit[g] ← 1.
6. else Reset credit[g] to any value between its current value and 1.

Figure 2: Landlord as it specializes for paging. To get Lru, reset credit[g] to 1
in line 6 and evict the single least-recently-requested 0-credit item in line 4. To
get Fwf, leave credit[g] unchanged in line 6 and evict all 0-credit items in line
4. To get Fifo, leave credit[g] unchanged in line 6 and evict the single 0-credit
item that has been in the cache the longest in line 4. All of these strategies
maintain credits in {0, 1}.

Proof: Much as in the preceding proof, take
$\tau(k, k - h) = \alpha + \beta \ln(k/(k - h + 1))$ and
$b = \delta n/\ln(e/\varepsilon) - 1$. If $b \le 0$, then an easy calculation
shows $c \ge \alpha + \beta \ln n$, so the conclusion holds trivially.

Otherwise ($b > 0$), we apply the technical lemma. With this choice of $b$,
$\varepsilon^{-(b+1)/(\delta n - b - 1)} = e$, so $c = e\,\tau(n, b)$. For this
$\tau$ and $b$, $\tau(n, b)$ simplifies to
$\alpha + \beta \ln[(1/\delta) \ln(e/\varepsilon)]$. □

4 Lower Bound on Loose Competitiveness. 

In this section we show the following theorem. 

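Theorem 4 For any $\varepsilon$ and $\delta$ with $0 < \varepsilon < 1$ and
$0 < \delta < 1/2$, Landlord is not $(\varepsilon, \delta)$-loosely
$c$-competitive for
$c = \frac{1}{8\delta} \log_2 \frac{1}{2\varepsilon} = \Theta((1/\delta) \log(1/\varepsilon))$.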

For the proof we adapt an unpublished result from [18]. We consider the
least-recently-used (Lru) and flush-when-full (Fwf) paging strategies. (Recall
that paging is the special case of file caching when each size and retrieval cost is
1.) We assume the reader is familiar with Fwf and Lru, but just in case, here
is a brief description of each. When an item not in the cache is requested and
the cache is full, Fwf empties the cache completely. In contrast, Lru evicts the
single item that was least recently requested. Figure 2 describes how each is a
special case of Landlord.
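
For readers who want to experiment, the two strategies just described can be simulated directly. This is a minimal sketch with illustrative function names; it counts faults for unit-size, unit-cost items and is written independently of the Landlord formulation.

```python
from collections import OrderedDict


def lru_faults(requests, k):
    """Number of faults of least-recently-used with a cache of size k."""
    cache = OrderedDict()   # keys ordered from least to most recently used
    faults = 0
    for r in requests:
        if r in cache:
            cache.move_to_end(r)           # r is now the most recent item
        else:
            faults += 1
            if len(cache) >= k:
                cache.popitem(last=False)  # evict the least recent item
            cache[r] = True
    return faults


def fwf_faults(requests, k):
    """Number of faults of flush-when-full with a cache of size k."""
    cache = set()
    faults = 0
    for r in requests:
        if r not in cache:
            faults += 1
            if len(cache) >= k:
                cache.clear()              # flush the whole cache
            cache.add(r)
    return faults
```

On the sequence a, b, c, d, b, c with k = 3, Lru faults 4 times while Fwf faults 6 times, because the flush triggered by d also discards b and c.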

We give the desired lower bound for Fwf. Since Landlord generalizes
Fwf, the result follows. This appears unsatisfactory, because it would be
natural to restrict Landlord (in line 5) to evict only one file at a time
(unlike Fwf).
However, the same lower bound proof applies even to a version of Landlord
that has this behavior. (We discuss this more after the proof.) Interestingly, 
the lower bound does not apply to Lru. In fact, for the sequences constructed 
for the lower bound, Lru is a near-optimal algorithm. 

The proof uses the concept of k-phases from the standard competitive
analysis framework. We define k-phases as follows. Let
$s = s_1 s_2 \ldots s_n$ be any sequence of requests. Consider running Fwf with
a cache of size k on the sequence, and break the sequence into pieces (called
phases or k-phases) so that each piece starts with a request that causes Fwf to
flush its cache. Thus, each phase (except the last) contains requests to k
distinct items, and each phase (except the first) starts with a request to an
item not requested in the previous phase.

The adversarial sequence. Fix any $\varepsilon, \delta > 0$ with
$\varepsilon < 1$ and $\delta < 1/2$. Define (with foresight) $c$ as in the
theorem and let $n$ be some sufficiently large integer. We will show that
Landlord is not $(\varepsilon, \delta, n)$-loosely $c$-competitive. Define
$k_0 = \lceil (1 - \delta) n \rceil$. We will focus on $k$ in the range
$k_0, \ldots, n$, inductively constructing a sequence $s$ such that each cache
size in this range is bad for Fwf in the sense of Condition (2). That is, for
each such $k$, we will show
$\mathrm{cost}(\mathrm{Fwf}, k, s) > \max\{c \cdot \mathrm{cost}(\mathrm{OPT}, k, s),\ \varepsilon |s|\}$.
The number of $k$'s in the range is $1 + n - k_0 \ge \delta n$, so this will
show the desired result.

In the construction we will build sequences that contain a special request
"x". Each occurrence of x represents a request to an item that is not requested
anywhere else (so all occurrences refer to different items).

For the base case of the induction, we let $s_0$ be a sequence containing $k_0$
special requests x. For the inductive step we do the following. For
$i = 0, 1, 2, \ldots$ let $k_{i+1} = \lceil k_0 (1 + 1/(4c))^{i+1} \rceil$ and
let $s_{i+1}$ be obtained from $s_i$ by choosing any $k_{i+1} - k_i$ special
requests x (including the first one) in $s_i$, replacing each unchosen x with a
regular request not occurring elsewhere in $s_i$, and then appending two copies
of the modified string.

For example, if $k_0 = 4$ and $k_1 = 5$, then $s_0$ = xxxx and $s_1$ = x123x123.
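
The inductive step can be sketched in Python as follows. This is an illustration, not the paper's code: the function names are hypothetical, regular requests are represented by fresh integers, every special request is the string `"x"`, the "chosen" special requests are taken to be the first ones in the string, and the cache sizes $k_1, k_2, \ldots$ are passed in directly rather than computed from $c$.

```python
from itertools import count


def next_sequence(s, keep, fresh):
    """One inductive step: keep the first `keep` x's, rename the rest, double."""
    if keep < 1 or keep > s.count("x"):
        raise ValueError("need 1 <= keep <= number of special requests")
    modified = []
    seen_x = 0
    for item in s:
        if item == "x":
            seen_x += 1
            # Unchosen special requests become new regular requests.
            modified.append("x" if seen_x <= keep else next(fresh))
        else:
            modified.append(item)
    return modified + modified          # two copies of the modified string


def build_sequence(k0, later_ks):
    """Build s_i from k_0 and the list [k_1, ..., k_i]."""
    fresh = count(1)                    # source of unused regular requests
    s = ["x"] * k0
    prev = k0
    for k in later_ks:
        s = next_sequence(s, k - prev, fresh)
        prev = k
    return s


def distinct_items(s):
    """Distinct items referenced, counting every x as a different item."""
    return len({item for item in s if item != "x"}) + s.count("x")
```

With `k0 = 4` and `later_ks = [5]` this reproduces the example x123x123, which has length 8 and references 5 distinct items.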

We let the final sequence $s$ be any $s_i$ such that $k_i \ge n$. This describes
the construction. The basic useful properties of $s$ are the following:

Lemma 2 (1) Each $s_i$ has length $k_0 2^i$ and references $k_i$ distinct items.

(2) Any item $r$ introduced in the $i$th inductive step (building $s_{i+1}$)
has periodicity $k_0 2^i$ in $s$. That is, for some $j$ with
$1 \le j \le k_0 2^i$, the positions in $s$ at which $r$ is requested are
$j,\ j + k_0 2^i,\ j + 2 \cdot k_0 2^i,\ j + 3 \cdot k_0 2^i, \ldots$.

(3) For each $i$, each length-$k_0 2^i$ contiguous subsequence of $s$
references $k_i$ distinct items.

Proof: Properties (1) and (2) above are easy to verify by induction. Property
(3) follows from properties (1) and (2). In particular, in each
length-$k_0 2^i$ contiguous subsequence of $s$, each item of periodicity
$k_0 2^j$ (for $j \le i$) is requested $2^{i-j}$ times, and each other request
is to an item of periodicity larger than $k_0 2^i$ that is requested only once
in the subsequence. Since each length-$k_0 2^i$ contiguous subsequence has this
structure, each such subsequence references the same number of distinct items
as the string $s_i$, that is, $k_i$ distinct items. □
Using these properties, we show the following:

Lemma 3 Suppose $n$ is larger than $4c/(1 - \delta)$. Using any cache size $k$
such that $k_0 \le k \le n$, the fault rate of Fwf on $s$ is more than $c$
times that of Lru.

Proof: In the construction of $s_{i+1}$ from $s_i$, we were careful to leave
the first special request x in $s_i$ alone. This ensures that each $k_i$-phase
of $s$ is of length $k_0 2^i$ and starts with a symbol of periodicity greater
than $k_0 2^i$.

From these properties it is easy to calculate the fault rate of Fwf using a
cache of size $k_i$ on $s$. The fault rate of Fwf is $k_i/(k_0 2^i)$: each
$k_i$-phase has length $k_0 2^i$ and causes $k_i$ faults.

The fault rate of Lru can be calculated using the following observation. Lru
with a cache of size $k_i$ faults on exactly those items of periodicity greater
than $k_0 2^i$. This is because Lru evicts an item $r$ exactly when there have
been $k_i$ other distinct items requested since the last request to $r$, and we
know (property (3)) that between two requests of any item $r$ with periodicity
$k_0 2^j$ there are $k_j - 1$ distinct items (other than $r$) requested.

We can count the frequency of requests to items with periodicity greater
than $k_0 2^i$ as follows. Consider any contiguous subsequence of length
$k_0 2^{i+1}$. Let $a$ and $b$ be the first and second half of the subsequence,
respectively (each of $a$ and $b$ has length $k_0 2^i$). We know that there are
$k_i$ distinct items requested in $a$, and $k_{i+1}$ distinct items requested
in $ab$. But the items requested in $b$ that are not requested in $a$ are
exactly the items of periodicity greater than $k_0 2^i$. Thus, there are
$k_{i+1} - k_i$ such items in $b$. As each is requested exactly once in $b$,
the frequency of such requests (and the fault rate of Lru with a cache of size
$k_i$) is $(k_{i+1} - k_i)/(k_0 2^i)$.

Thus, for any $i$, using a cache of size $k_i$, the ratio of the fault rate of
Fwf to that of Lru is

$$k_i/(k_{i+1} - k_i).$$

An easy calculation (using the assumption $n > 4c/(1 - \delta)$) shows this is
at least $2c$.

What about any $k$ such that $k_i < k < k_{i+1}$ for some $i$? We know that Fwf
faults $k$ times in each $k$-phase. The number of $k$-phases is at least the
number of $k_{i+1}$-phases, i.e. at least $|s|/(k_0 2^{i+1})$. Thus, the fault
rate is at least $k_i/(k_0 2^{i+1})$, half the fault rate of Fwf with a cache
of size $k_i$. For Lru, the fault rate with a cache of size $k$ is at most the
fault rate with a cache of size $k_i$. Together these facts imply that (for any
$k$ such that $k_i < k < k_{i+1}$ for some $i$), using a cache of size $k$, the
ratio of the fault rate of Fwf to that of Lru is at least half the ratio when
using a cache of size $k_i$. Thus, the ratio is greater than $c$. □

To finish the proof of Theorem 4, we need to show that the fault rate of Fwf
remains above $\varepsilon$ for all $k$ such that $k_0 \le k \le n$. Reasoning
as in the previous proof, the fault rate of Fwf with such a cache size $k$ is
at least $k_i/(k_0 2^{i+1})$ for some $i$ where $k_i \le n$. So we need to show
$k_i/(k_0 2^{i+1}) \ge \varepsilon$ if $k_i \le n$. In fact, we show the
stronger result that $1/2^{i+1} \ge \varepsilon$.

The rest is algebra. In the following we will use the inequalities
$1 + x \ge 2^x$ for $0 \le x \le 1$ and $1 - x \ge 2^{-2x}$ for
$0 \le x \le 1/2$.
That $k_i \le n$ implies that $i \le 8\delta c$ by the following argument.
(Each line follows from the line before it by the reason given.)

    $k_i \le n$                                   given
    $(1 - \delta) n (1 + 1/(4c))^i \le n$         definition of $k_i$, and $x \le \lceil x \rceil$
    $2^{-2\delta} \cdot 2^{i/(4c)} \le 1$         inequalities mentioned above
    $i \le 8\delta c$                             algebra

Using this we will show $1/2^{i+1} \ge \varepsilon$, which implies
$k_i/(k_0 2^{i+1}) \ge \varepsilon$.

    $8\delta c \le \log_2(1/(2\varepsilon))$      definition of $c$
    $i \le \log_2(1/(2\varepsilon))$              $i \le 8\delta c$ (proven above)
    $1/2^{i+1} \ge \varepsilon$                   algebra

This concludes the proof of Theorem 4. □

We can modify Fwf so that it doesn't evict all items from the cache at
the beginning of the phase, but instead evicts the 0-credit items (those not yet
requested this phase) one at a time but pessimally, that is, in the order that
they will be next requested. The modified algorithm only evicts one page at a
time, but, since it still incurs $k$ faults per $k$-phase, the proof of
Theorem 4 applies to the modified algorithm as well. The modified algorithm is
also a special case of Landlord. Thus, the lower bound applies to Landlord even
if Landlord is constrained to evict only as many items as necessary to handle
the current request.



5 Further Directions 

A main open question here seems to be to more tightly characterize the loose
competitiveness of Lru. A reasonable goal would be to find a non-trivial lower
bound or an upper bound better than the one implied in this paper. The latter
would show that Lru is better than Fwf in this model. It would also be nice
to characterize the relative loose competitiveness of Lru and first-in-first-out
(Fifo).

Another direction is to find a non-trivial lower bound for the randomized 
marking algorithm for paging. Finally, the lower bounds in this paper apply 
to particular on-line algorithms; what lower bounds can be shown for arbitrary 
deterministic on-line algorithms, or for arbitrary randomized on-line algorithms? 